Learning of Latent Class Models by Splitting and Merging Components
Abstract
A problem in learning latent class models (also known as naive Bayes models with a hidden class variable) is that the parameters found are often only a local maximum. This leads not only to suboptimal parameters but also to a wrong number of classes (components) for the hidden variable. The standard solution, running the EM algorithm from many random starting points, is often too expensive computationally. We propose to obtain better starting points for EM by splitting and merging components in models whose parameters have already been estimated. This extends our previous work, which used only component splitting and in which the need for component merging was noticed. We relate component merging to model identifiability. We propose an algorithm that learns latent class models by performing component splitting, component merging, and an operation that combines the two. In experiments with real-world datasets, our algorithm is often better than the standard one when models contain many components. We discuss possible extensions of our work, the most promising being to learn the cardinalities and parameters of hidden variables in Bayesian networks.
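The idea of reusing an already fitted model as a starting point for EM can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the operators, so the functions `em`, `split`, and `merge`, the binary observed variables, the random perturbation used when splitting, the prior-weighted average used when merging, and the smoothing constant are all assumptions made for illustration.

```python
import math
import random

def em(data, prior, theta, iters=50):
    """EM for a latent class model with binary observed variables.
    prior[c] = P(class c); theta[c][j] = P(x_j = 1 | class c)."""
    n, d = len(data), len(data[0])
    loglik = float("-inf")
    for _ in range(iters):
        # E-step: posterior responsibility of each class for each record.
        resp, loglik = [], 0.0
        for x in data:
            joint = [p * math.prod(t[j] if x[j] else 1.0 - t[j] for j in range(d))
                     for p, t in zip(prior, theta)]
            total = sum(joint)
            loglik += math.log(total)
            resp.append([v / total for v in joint])
        # M-step with light smoothing (an assumption) to stay away from 0 and 1.
        k = len(prior)
        mass = [sum(r[c] for r in resp) for c in range(k)]
        prior = [(m + 1e-3) / (n + k * 1e-3) for m in mass]
        theta = [[(sum(r[c] * x[j] for r, x in zip(resp, data)) + 1e-3)
                  / (mass[c] + 2e-3) for j in range(d)] for c in range(k)]
    # loglik is the value computed in the final E-step.
    return prior, theta, loglik

def split(prior, theta, c, rng, noise=0.1):
    """Hypothetical split: replace component c by two perturbed copies
    that share its prior mass equally."""
    def jitter(t):
        return [min(0.99, max(0.01, v + rng.uniform(-noise, noise))) for v in t]
    new_prior = prior[:c] + prior[c + 1:] + [prior[c] / 2, prior[c] / 2]
    new_theta = theta[:c] + theta[c + 1:] + [jitter(theta[c]), jitter(theta[c])]
    return new_prior, new_theta

def merge(prior, theta, a, b):
    """Hypothetical merge: replace components a and b by their
    prior-weighted average."""
    w = prior[a] + prior[b]
    merged = [(prior[a] * ta + prior[b] * tb) / w
              for ta, tb in zip(theta[a], theta[b])]
    keep = [i for i in range(len(prior)) if i not in (a, b)]
    return [prior[i] for i in keep] + [w], [theta[i] for i in keep] + [merged]

rng = random.Random(0)
# Synthetic records drawn from two made-up classes with different item probabilities.
data = [[int(rng.random() < p) for _ in range(6)] for p in [0.9] * 100 + [0.1] * 100]
prior1, theta1, ll1 = em(data, [1.0], [[0.5] * 6])        # one-component fit
start_prior, start_theta = split(prior1, theta1, 0, rng)  # starting point for EM
prior2, theta2, ll2 = em(data, start_prior, start_theta)  # two-component fit
back_prior, back_theta = merge(prior2, theta2, 0, 1)      # back to one component
```

In this sketch the starting point for the two-component run comes from the fitted one-component model rather than from random parameters. A fuller search would presumably try several candidate splits and merges and keep the one with the best score, but how candidates are chosen and scored is not stated in the abstract.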
Similar resources
An application of Measurement error evaluation using latent class analysis
Latent class analysis (LCA) is a method of evaluating nonsampling errors, especially measurement error, in categorical data. Biemer (2011) introduced four latent class modeling approaches: probability model parameterization, the log-linear model, the modified path model, and the graphical model using path diagrams. These models are interchangeable. Latent class probability models express l...
Clustering and combining pattern of metabolic syndrome components among Iranian population with latent class analysis
Background: Metabolic syndrome (MetS), a combination of risk factors for coronary heart disease and diabetes mellitus, is one of the most challenging public health issues worldwide. The aim of this study was to identify subgroups of participants on the basis of MetS components. Methods: The cross-sectional study took place in the districts related to Teh...
Using latent variables in the logistic regression model to remove the effect of multicollinearity in the analysis of some factors related to breast cancer
Background and Objectives: Logistic regression is one of the most widely used generalized linear models for analyzing the relationship between one or more explanatory variables and a categorical response. Strong correlations among explanatory variables (multicollinearity) reduce the efficiency of the model to a considerable degree. In this study we used latent variables to reduce the effects of ...
A Comparison of Numerical Splitting-Based Methods for Markovian Dependability and Performability Models
Iterative numerical methods are an important ingredient in the solution of continuous-time Markov dependability models of fault-tolerant systems. In this paper we make a numerical comparison of several splitting-based iterative methods. We consider the computation of the steady-state reward rate on rewarded models. This measure requires the solution of a singular linear system. We consider two clas...
以狄式分佈為基礎之多語聲學模型拆分及合併 (Multilingual Acoustic Model Splitting and Merging by Latent Dirichlet Allocation) [In Chinese]
Avoiding confusion between the phonetic acoustic models of different languages is one of the biggest challenges in multilingual speech recognition. We propose a method based on Latent Dirichlet Allocation to avoid this confusion. We split the phonetic acoustic models based on triphones and merge the groups selected by Latent Dirichlet Al...
Publication date: 2004